Abstract

The complete genome sequence of the first equine coronavirus (ECoV) isolate, the NC99 strain, was determined by directly sequencing 11 overlapping fragments that were RT-PCR amplified from viral RNA. The ECoV genome is 30,992 nucleotides in length, excluding the poly(A) tail. Analysis of the sequence identified 11 open reading frames which encode two replicase polyproteins, five structural proteins (hemagglutinin esterase, spike, envelope, membrane, and nucleocapsid) and four accessory proteins (NS2, p4.7, p12.7, and I). The two replicase polyproteins are predicted to be proteolytically processed by three virus-encoded proteases into 16 non-structural proteins (nsp1-16). The ECoV nsp3 protein had considerable amino acid deletions and insertions compared to the nsp3 proteins of bovine coronavirus, human coronavirus OC43, and porcine hemagglutinating encephalomyelitis virus, the three group 2 coronaviruses phylogenetically most closely related to ECoV. The structure of subgenomic mRNAs was analyzed by Northern blot analysis and sequencing of the leader-body junction in each sg mRNA.

© 2007 Elsevier Inc. All rights reserved. 
Introduction

Coronaviruses are mainly associated with respiratory and gastrointestinal disease in humans (Drosten et al., 2003; Holmes, 2001; Ksiazek et al., 2003; Peiris et al., 2003; van der Hoek et al., 2004; Woo et al., 2005) and respiratory, enteric, neurological, or hepatic disease in animals (Holmes, 2001). Coronaviruses have also been isolated from bats, poultry and other birds (Cavanagh, 2005; Chu et al., 2006; Poon et al., 2005; Ren et al., 2006). On the basis of antigenic and genetic analyses, coronaviruses are divided into three groups (Gonzalez et al., 2003; Gorbalenya et al., 2004; Snijder et al., 2003). Group 1 viruses include human coronaviruses 229E (HCoV-229E) and NL63 (HCoV-NL63), canine coronavirus (CCoV), feline coronavirus (FCoV), porcine transmissible gastroenteritis virus (TGEV), porcine epidemic diarrhea virus (PEDV), and bat coronavirus. Group 2 viruses are subdivided into group 2a, which includes murine hepatitis virus (MHV), human coronaviruses OC43 (HCoV-OC43) and HKU1 (HCoV-HKU1), bovine coronavirus (BCoV), porcine hemagglutinating encephalomyelitis virus (PHEV), and rat coronavirus (RCoV), and group 2b, which includes SARS-coronavirus (SARS-CoV). Group 3 viruses include avian viruses, such as avian infectious bronchitis virus (IBV) and turkey coronavirus (TCoV).

Members of the family Coronaviridae are enveloped, positive-stranded RNA viruses with exceptionally large, polycistronic genomes (27-32 kb). The 5'-proximal two-thirds of the genome comprises two open reading frames (ORFs), ORF 1a and ORF 1b, which encode the replicase polyproteins pp1a and pp1ab (Ziebuhr, 2005). Expression of pp1ab requires a -1 ribosomal frameshift during translation of the genomic RNA (Brierley et al., 1987). The two replicase polyproteins are processed extensively by two or three viral proteases encoded by ORF 1a to generate up to 16 end-products, termed nonstructural proteins (nsp) 1 to 16, and multiple processing intermediates (Ziebuhr, 2005; Ziebuhr et al., 2000). The N-proximal region of the polyproteins is processed by one or two papain-like proteases (PLpro), whereas the central and C-proximal regions are processed by the viral main protease, the 3C-like protease (3CLpro) (Ziebuhr, 2005; Ziebuhr et al., 2000). The 3'-proximal one-third of the genome encodes structural proteins and various accessory proteins. Genes encoding the four structural proteins present in all coronaviruses occur in the 5' to 3' order spike (S), envelope (E), membrane (M), and nucleocapsid (N) (Brian and Baric, 2005; Lai et al., 2006). Some coronaviruses contain an additional structural protein, the hemagglutinin-esterase (HE) protein, whose gene is located upstream of the S protein gene (Lai et al., 2006). In contrast to the replicase proteins, which are translated directly from the genomic RNA, coronavirus structural and accessory proteins are expressed from a nested set of 3' co-terminal subgenomic (sg) mRNAs that also possess a common 5' leader sequence derived from the 5' end of the genome (Pasternak et al., 2006; Sawicki et al., 2007). The common 5' leader is fused to the 3' body segments through a mechanism that is presumed to involve discontinuous minus-strand RNA synthesis to produce subgenome-length templates for subgenomic mRNA synthesis, with the transcription regulatory sequence (TRS) elements determining the fusion sites of leader and body segments (see the recent reviews of Pasternak et al., 2006 and Sawicki et al., 2007 for details).

Equine coronavirus (ECoV) was first isolated from the feces of a diarrheic foal in 1999 (ECoV-NC99) in North Carolina, USA (Guy et al., 2000). Little is known about ECoV and its clinical significance. Molecular characterization of ECoV and development of diagnostic and prophylactic reagents require the sequence of the ECoV genome. In this study, we determined the full-length nucleotide sequence of the ECoV-NC99 strain. The viral genome and proteome were analyzed and the predicted features of ECoV nonstructural, structural, and accessory proteins were compared to those of other coronaviruses. Synthesis of sg mRNAs in ECoV-infected cells was analyzed by Northern blotting. The leader-body junction sequence in each sg mRNA was determined and the exact position of the TRS used for synthesis of each sg mRNA was mapped on the genome. The evolutionary relationship between ECoV and other phylogenetically closely related group 2a coronaviruses was explored.

Results and discussion 

ECoV genome sequence analysis 

We report here the full-length genomic sequence of the first ECoV isolate, the NC99 strain; this is also the first reported complete genome sequence of ECoV. The nucleotide sequence was determined by directly sequencing 11 overlapping cDNA fragments that were RT-PCR amplified from viral RNA. The ECoV-NC99 genome comprises 30,992 nucleotides (nt), excluding the 3' poly(A) tail, and has a GC content of 37.2%. The nucleotide sequence data have been deposited in GenBank under accession number EF446615.

Both the 5' and 3' ends of the ECoV genome contain short untranslated regions (UTRs). The 5' UTR comprises 209 nt (nt 1-209) and includes a potential short internal ORF of 8 codons (nt 99-125). Four stem-loop structures (I, II, III, and IV) were identified in the 5' UTR and a short stretch of nucleotides that are part of ORF 1a (see Supplementary Fig. S1). The bulged stem-loops III (nt 96-115) and IV (nt 189-208) closely resemble the stem-loops III and IV that have been identified as replication signaling elements in bovine coronavirus and other group 2 coronaviruses (Raman and Brian, 2005; Raman et al., 2003; Wu et al., 2003). The 3' UTR of the ECoV genome comprises 289 nt (nt 30,704-30,992) and contains a putative bulged stem-loop structure (nt 30,703-30,770) and a putative pseudoknot structure (nt 30,766-30,819) (see Supplementary Fig. S2). Similar putative bulged stem-loop and pseudoknot structures have been identified in murine hepatitis virus and other group 2 coronaviruses; these have been shown to be essential for viral replication (Goebel et al., 2004a,b; Hsue and Masters, 1997; Hsue et al., 2000; Williams et al., 1999).

Analysis of the ECoV-NC99 genome reveals 11 potential ORFs (1a, 1b, 2-8, 9a and 9b) as shown in Fig. 1 and Table 1. ORFs 1a and 1b encode the replicase polyproteins pp1a and pp1ab. ORFs 2-8, 9a and 9b encode the structural and accessory proteins NS2, HE, S, p4.7, p12.7, E, M, N, and I, respectively.
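The coordinates reported in Table 1 can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis; it assumes that each listed ORF range is 1-based, inclusive, and ends with a stop codon. The dictionary and function names are illustrative only.

```python
# ORF coordinates (1-based, inclusive) and protein lengths as listed in Table 1.
# Assumption: every range includes the terminal stop codon.
ORFS = {
    "ORF1a": (210, 13499, 4429),
    "ORF2 (NS2)": (21610, 22446, 278),
    "ORF3 (HE)": (22458, 23729, 423),
    "ORF4 (S)": (23744, 27835, 1363),
    "ORF5 (p4.7)": (27825, 27947, 40),
    "ORF6 (p12.7)": (28076, 28405, 109),
    "ORF7 (E)": (28392, 28646, 84),
    "ORF8 (M)": (28661, 29353, 230),
    "ORF9a (N)": (29363, 30703, 446),
    "ORF9b (I)": (29424, 30044, 206),
}


def orf_length_nt(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive 1-based range."""
    return end - start + 1


def protein_length_aa(start: int, end: int) -> int:
    """Codons in the ORF minus the stop codon."""
    nt = orf_length_nt(start, end)
    if nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return nt // 3 - 1


for name, (start, end, reported_aa) in ORFS.items():
    computed = protein_length_aa(start, end)
    status = "ok" if computed == reported_aa else "MISMATCH"
    print(f"{name}: {orf_length_nt(start, end)} nt, {computed} aa ({status})")
```

Under these assumptions every single-frame ORF in Table 1 is internally consistent. The frameshifted ORF 1a/1b product needs separate treatment because one nucleotide is decoded twice.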

The replicase ORF 1a (nt 210-13,499) and replicase ORF 1b (nt 13,478-21,595) occupy 21.4 kb (69%) of the ECoV-NC99 genome. Translation of ORF 1a generates a precursor, pp1a, of 4429 amino acids. As in other coronaviruses, translation of ORF 1b involves a -1 ribosomal frameshift, generating a 7128-amino acid pp1ab. The ribosomal frameshift is assumed to be directed by two signals in the ORF 1a/1b overlapping region: a slippery sequence 5'UUUAAAC3' (nt 13,472-13,478) and a predicted downstream RNA pseudoknot structure (nt 13,484-13,559) (see Supplementary Fig. S3). The pp1a and pp1ab proteins are predicted to be proteolytically processed by virus-encoded proteases into 16 non-structural proteins (nsp1-16, Table 2) required for viral replication and transcription. By comparison to other coronaviruses, a number of putative functional domains are predicted in the ECoV pp1a and pp1ab; these are summarized in Fig. 1 and Table 2 (Gorbalenya et al., 1991, 2006; Snijder et al., 2003; Ziebuhr, 2005; Ziebuhr et al., 2001). Enzymatic activities of nsp3, nsp5, nsp12, nsp13, nsp14 and nsp15 have been experimentally confirmed for some coronaviruses (Barretto et al., 2005; Cheng et al., 2005; Guarino et al., 2005; Heusipp et al., 1997; Ivanov et al., 2004a,b; Ivanov and Ziebuhr, 2004; Lindner et al., 2005; Minskaia et al., 2006; Putics et al., 2005, 2006; Seybert et al., 2000, 2005; Tanner et al., 2003; Ziebuhr, 2005; Ziebuhr et al., 2001). The 3CLpro (catalytic residues His-3333 and Cys-3437) is predicted to cleave the C-terminal half of the ECoV pp1a and the ORF 1b-encoded part of pp1ab. The putative PL1pro (catalytic residues Cys-1078 and His-1229) and PL2pro (catalytic residues Cys-1675 and His-1832) are predicted to process the N-proximal regions of the ECoV pp1a (Fig. 1 and Table 2).
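The relationship between the genome coordinates and the pp1ab length can be illustrated with a short calculation. This sketch assumes the commonly described -1 frameshift model in which the last nucleotide of the slippery sequence (nt 13,478) is decoded twice, once in the ORF 1a frame and once as the first nucleotide of the next codon; the constant and function names are hypothetical.

```python
ORF1A_START = 210      # first nt of the ORF 1a start codon
SLIPPERY_END = 13478   # last nt of UUUAAAC; assumed to be decoded twice
ORF1B_END = 21595      # last nt of the ORF 1b stop codon

# Last residue whose codon is read entirely in the ORF 1a frame.
LAST_ZERO_FRAME_AA = (SLIPPERY_END - ORF1A_START + 1) // 3


def pp1ab_length_aa() -> int:
    """Length of pp1ab, counting nt 13,478 in both reading frames."""
    pre = SLIPPERY_END - ORF1A_START + 1   # nt decoded before the shift
    post = ORF1B_END - SLIPPERY_END + 1    # nt decoded after the -1 shift
    codons = (pre + post) // 3
    return codons - 1                      # drop the stop codon


def codon_start_nt(aa_pos: int) -> int:
    """Genome position of the first nt of the codon for a pp1ab residue."""
    start = ORF1A_START + 3 * (aa_pos - 1)
    if aa_pos > LAST_ZERO_FRAME_AA:
        start -= 1                         # codons downstream of the shift
    return start


print(pp1ab_length_aa())      # pp1ab length in amino acids
print(codon_start_nt(4416))   # first codon of nsp11/nsp12 (Ser4416)
print(codon_start_nt(5344))   # first codon of nsp13 (Ser5344)
```

With these inputs the sketch reproduces the 7128-residue pp1ab and the nucleotide start positions listed for nsp12 (13,455) and nsp13 (16,238) in Table 2, which suggests the tabulated coordinates follow the same convention.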

The most striking differences between the ECoV replicase and the replicases of other group 2 coronaviruses were identified in nsp3. The ECoV nsp3 protein has 3 aa deletions and 55 aa insertions compared to the nsp3 proteins of BCoV, HCoV-OC43, and PHEV, the three viruses phylogenetically most closely related to ECoV. These insertions and deletions are clustered in two regions: the Ac domain and the region between the PL2pro and the Y domain. The functional significance of these insertions and deletions is as yet unknown; however, the functions of PL1pro, PL2pro, and ADRP are not anticipated to be affected, since the insertions and deletions are not located in the functional domains of these enzymes (Fig. 1).

Fig. 1. Schematic diagrams of ECoV genome organization. The organization of the entire ECoV genome is depicted (middle). The 5' leader and ORFs 1a and 1b encoding the replicase polyproteins are shown, with the ribosomal frameshift site indicated. Structural and accessory proteins are also indicated: NS2 protein (encoded by ORF2), hemagglutinin esterase (HE, ORF3), spike protein (S, ORF4), p4.7 protein (ORF5), p12.7 protein (ORF6), envelope protein (E, ORF7), membrane protein (M, ORF8), nucleocapsid protein (N, ORF9a), and I protein (ORF9b). Predicted cleavage products (nsp1-nsp16) of the replicase polyproteins are depicted (bottom). Arrows represent sites in the corresponding replicase polyproteins that are cleaved by papain-like proteases (white arrows) or the 3C-like cysteine protease (black arrows). A number of putative functional domains predicted in the ECoV pp1a and pp1ab are indicated. PL1, papain-like proteinase 1 (aa 1059-1275); PL2, papain-like proteinase 2 (aa 1570-1867); X, X-domain which contains adenosine diphosphate-ribose 1''-phosphatase (ADRP) (aa 1276-1435); TM, transmembrane domain; 3CL, 3C-like proteinase; RdRp, RNA-dependent RNA polymerase; Z, zinc-binding domain; HEL, helicase domain; ExoN, exonuclease; N, nidoviral uridylate-specific endoribonuclease (NendoU); MT, 2'-O-ribose methyltransferase (2'-O-MT). Domains Ac (aa 846-1058) and Y (aa 2310-2796) are described by Ziebuhr et al. (2001). The spike protein (1363 amino acids) of ECoV is represented by a black line (top). The N-terminal signal peptide (amino acid residues 1-14 or 1-17), the heptad repeat 1 (HR1, amino acid residues 991-1092), the heptad repeat 2 (HR2, amino acid residues 1259-1304), the transmembrane domain (amino acid residues 1308-1330), and the cytoplasmic domain (amino acid residues 1331-1363) are depicted. A potential cleavage recognition sequence (RRQRR) at residues 764-768 and the predicted cleavage site between residues 768 and 769 are indicated. The resulting cleavage products, the S1 and S2 subunits, are depicted. The positions of the receptor-binding domain on the S1 subunit and the fusion peptide on the S2 subunit are currently unknown.

Table 1
Coding potential of the ECoV-NC99 genome sequence

| ORF | Encoded protein | Nucleotide position in the genome | No. of nucleotides | No. of amino acids (aa) | mRNA used for expression (a) |
|---|---|---|---|---|---|
| 5' Leader | | 1-64 | 64 | | |
| 5' UTR | | 1-209 | 209 | | |
| ORF1a | pp1a | 210-13,499 | 13,290 | 4429 | 1 |
| ORF1a/b | pp1ab | 210-21,595 | 21,386 | 7128 | 1 |
| ORF2 | NS2 | 21,610-22,446 | 837 | 278 | 2 |
| ORF3 | HE | 22,458-23,729 | 1272 | 423 | 3 |
| ORF4 | S | 23,744-27,835 | 4092 | 1363 | 4 |
| ORF5 | p4.7 | 27,825-27,947 | 123 | 40 | 5 |
| ORF6 | p12.7 | 28,076-28,405 | 330 | 109 | 6 |
| ORF7 | E | 28,392-28,646 | 255 | 84 | 7 |
| ORF8 | M | 28,661-29,353 | 693 | 230 | 8 |
| ORF9a | N | 29,363-30,703 | 1341 | 446 | 9 |
| ORF9b | I | 29,424-30,044 | 621 | 206 | 9 |
| 3' UTR | | 30,704-30,992 | 289 | | |

(a) The mRNA used for expression of each protein is derived from the Northern blotting analysis and the comparison with other group 2a coronaviruses. See the text for details.

Table 2
Predicted end-products of proteolytic processing of the ECoV replicase polyproteins pp1a and pp1ab

| Cleavage product | Nucleotide position (a) | Polyprotein | Position in pp1a/pp1ab (aa) | Length (aa) | Putative functional domain(s) (b) | Putative proteases predicted to release protein from polyproteins |
|---|---|---|---|---|---|---|
| nsp1 | 210-941 | pp1a/pp1ab | Met1-Gly244 | 244 | | PL1pro |
| nsp2 | 942-2744 | pp1a/pp1ab | Val245-Ala845 | 601 | | PL1pro |
| nsp3 | 2745-8597 | pp1a/pp1ab | Gly846-Gly2796 | 1951 | Ac, PL1pro, ADRP, PL2pro, TM1, Y | PL2pro |
| nsp4 | 8598-10,085 | pp1a/pp1ab | Ala2797-Gln3292 | 496 | TM2 | PL2pro + 3CLpro |
| nsp5 | 10,086-10,994 | pp1a/pp1ab | Ser3293-Gln3595 | 303 | 3CLpro | 3CLpro |
| nsp6 | 10,995-11,855 | pp1a/pp1ab | Ser3596-Gln3882 | 287 | TM3 | 3CLpro |
| nsp7 | 11,856-12,122 | pp1a/pp1ab | Ser3883-Gln3971 | 89 | Part of RNA binding hexadecameric supercomplex | 3CLpro |
| nsp8 | 12,123-12,713 | pp1a/pp1ab | Ala3972-Gln4168 | 197 | Part of RNA binding hexadecameric supercomplex | 3CLpro |
| nsp9 | 12,714-13,043 | pp1a/pp1ab | Asn4169-Gln4278 | 110 | ssRNA-binding protein | 3CLpro |
| nsp10 | 13,044-13,454 | pp1a/pp1ab | Ala4279-Gln4415 | 137 | 2 zinc fingers | 3CLpro |
| nsp11 | 13,455-13,496 | pp1a | Ser4416-Ser4429 | 14 | | 3CLpro |
| nsp12 | 13,455-16,237 | pp1ab | Ser4416-Gln5343 | 928 | RdRp | 3CLpro |
| nsp13 | 16,238-18,034 | pp1ab | Ser5344-Gln5942 | 599 | ZBD, HEL | 3CLpro |
| nsp14 | 18,035-19,597 | pp1ab | Cys5943-Gln6463 | 521 | Exonuclease (ExoN) | 3CLpro |
| nsp15 | 19,598-20,695 | pp1ab | Ser6464-Gln6829 | 366 | NendoU | 3CLpro |
| nsp16 | 20,696-21,592 | pp1ab | Ala6830-Ile7128 | 299 | 2'-O-MT | 3CLpro |

Domains Ac and Y are described by Ziebuhr et al. (2001).
(a) Nucleotide position means the location of the nucleotides encoding the corresponding protein in the entire genome of the equine coronavirus NC99 strain.
(b) PL1pro, papain-like proteinase 1; PL2pro, papain-like proteinase 2; ADRP, adenosine diphosphate-ribose 1''-phosphatase (formerly known as 'X-domain'); 3CLpro, 3C-like proteinase; TM, transmembrane domain; GFL, growth factor-like domain; RdRp, RNA-dependent RNA polymerase; ZBD, zinc-binding domain; HEL, helicase domain; NendoU, nidoviral uridylate-specific endoribonuclease; 2'-O-MT, 2'-O-ribose methyltransferase.

ORF2 (nt 21,610-22,446) of ECoV-NC99 encodes the predicted NS2 protein of 278 amino acids. The NS2 of ECoV has 67%, 67%, and 45% amino acid identity with the respective NS2 proteins of BCoV, HCoV-OC43, and PHEV. The lower amino acid identity with PHEV may be attributable to the fact that PHEV has a truncated NS2 protein (Vijgen et al., 2006). Sequence analysis revealed that the ECoV NS2 protein contains a domain (aa 46-135) with similarity to the putative cyclic phosphodiesterase (CPD, Martzen et al., 1999). The CPD domain has also been identified in the NS2 proteins of other group 2a coronaviruses as well as in the 3' end of the pp1a protein of toroviruses (Gorbalenya et al., 2006; Snijder et al., 1991, 2003). The NS2 of ECoV was predicted to contain 9 potential phosphorylation sites. It does not contain a signal peptide and is predicted to be a non-secretory protein. The function of the NS2 protein in coronaviruses has not been studied in detail. It is known that the NS2 gene is non-essential for MHV replication in transformed cells (Schwarz et al., 1990). However, a recent study showed that a point mutation in the NS2 of MHV led to its attenuation in mice in spite of its wild-type replication in tissue culture (Sperry et al., 2005).

ORF3 (nt 22,458-23,729) of ECoV-NC99 encodes the 
predicted HE protein containing 423 amino acids. Nine potential 
N-glycosylation sites were predicted. SignalP analysis revealed 
a signal peptide probability of 0.802 with a potential cleavage 
site between residues 17 and 18. It was predicted that the 
N-terminal 390 amino acids are located outside the cell surface 
or viral envelope with a transmembrane helix at amino acids 
391-413 and an internal domain at amino acids 414-423. The 
putative active site for esterase activity, FGDS (Kienzle et al., 
1990), is present at amino acids 36-39 of the HE protein in 
ECoV. 


ORF4 (nt 23,744-27,835) of ECoV-NC99 encodes the predicted spike (S) protein containing 1363 amino acids. Eighteen potential N-glycosylation sites were predicted. An N-terminal signal peptide was identified with a potential cleavage site between amino acids 14 and 15 predicted by SignalP-NN or between amino acids 17 and 18 predicted by SignalP-HMM. The ECoV S protein was predicted to be a typical type I membrane protein with the N-terminal 1307 residues exposed on the outside of the cell surface or virus particle, a transmembrane domain near the C terminus (residues 1308-1330), followed by a cytoplasmic tail (residues 1331-1363). Following multiple alignments with the S proteins of other group 2a coronaviruses, a potential cleavage recognition sequence (RRQRR) was identified at residues 764-768, which would predict a cleavage between amino acids 768 and 769, separating the ECoV S protein into S1 and S2 subunits (Fig. 1). The ECoV S1 subunit is expected to contain a receptor-binding domain whose position has not yet been determined. The S2 subunit is predicted to mediate membrane fusion. Two heptad repeat (HR) regions, which are conserved in position and sequence among the three groups of coronaviruses and play important roles in membrane fusion (see reviews of Eckert and Kim, 2001; Hernandez et al., 1996), were identified in the ECoV S2 subunit (HR1: aa 991-1092; HR2: aa 1259-1304) (Fig. 1). The ECoV S2 subunit is anticipated to possess a fusion peptide whose position is as yet unknown. Some coronavirus S proteins have been shown to contain important neutralization epitopes (Godet et al., 1994; Kubo et al., 1994; Yoo et al., 1991) and mutations in the S protein have been associated with altered viral antigenicity and pathogenicity (Ballesteros et al., 1997; Bernard and Laude, 1995; Dalziel et al., 1986; Gallagher and Buchmeier, 2001; Leparc-Goffart et al., 1997). Whether the S protein of ECoV has such properties remains to be determined.

Fig. 2. Northern blot analysis of intracellular RNA isolated from ECoV-infected HRT-18G cells. A DIG-labeled probe complementary to the 3' end (nt 30,660-30,946) of the ECoV genome was used to detect the genomic and subgenomic mRNAs in ECoV-infected (lane 2) and mock-infected (lane 1) HRT-18G cells at 72 h p.i.

ORF5 (nt 27,825-27,947) of ECoV-NC99 is predicted to 
encode a hypothetical protein of 40 amino acids with an 
estimated molecular weight of 4.7 kDa (termed p4.7 protein). It 
was predicted to be a non-secretory protein and did not contain 
any transmembrane helix. This protein is not closely matched to 
any known protein based on a search using BLASTP, PSI- 
BLAST, or FASTA programs. 


ORF6 (nt 28,076-28,405) of ECoV-NC99 is predicted to
encode a protein of 109 amino acids corresponding to the BCoV
12.7 kDa non-structural protein (p12.7). This ORF overlaps by
15 nucleotides with the ORF7 that encodes the E protein. No
signal peptide or any transmembrane helix was present. No
N-glycosylation site was found.

ORF7 (nt 28,392-28,646) of ECoV-NC99 encodes the predicted
E protein containing 84 amino acids. No N-glycosylation
site was identified. It was predicted to contain a signal anchor 
(probability 0.999). One transmembrane domain was predicted 
at residues 18-36 by TMpred analysis or at residues 15-37 by 
TMHMM analysis. Both programs predicted the N-terminus of 
the protein to be external to the cell surface or viral envelope. In 
the case of other coronaviruses, there is increasing evidence 
that the E protein together with the M protein is instrumental in 
viral assembly and budding; the cytoplasmic tails of both 
proteins have an important interactive role in this process 
(Corse and Machamer, 2000, 2002, 2003; Vennema et al., 
1996). 

ORF8 (nt 28,661-29,353) of ECoV-NC99 encodes the 
predicted M protein containing 230 amino acids. It was 
predicted to contain a signal anchor (probability 0.947). Three 
transmembrane domains were predicted to be present at 
positions 25-46, 57-78, and 81-102 by TMpred analysis or 
at positions 25-44, 49-71, and 81-103 by TMHMM analysis. 
The N-terminal 24 amino acid residues were predicted to be 
outside and the C-terminal 127 or 128-amino acid hydrophilic 
domain was predicted to be inside the virus. One potential N- 
glycosylation site was predicted at position 26 (NFS). The 
presence of potential O-glycosylation sites was predicted at the 
extreme N-terminus of the M protein (MSSTPTPAPGYT). 
Whether these sites are glycosylated needs to be experimentally
verified. Previous studies have shown that the M
proteins of group 1 and 3 coronaviruses (e.g. TGEV and IBV) are
N-glycosylated, whereas the M protein of group 2 coronavirus
MHV is only O-glycosylated (de Haan et al., 2002; Lai et al., 
2006). The M protein is the most abundant envelope component 
and plays a key role in coronavirus assembly by interacting with 
the E, S, N and HE proteins (Bosch et al., 2005; de Haan and 
Rottier, 2005, and references therein). 
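Potential N-glycosylation sites such as the NFS motif noted above are conventionally recognized by the sequon N-X-S/T, where X is any residue except proline. The following minimal sketch shows such a scan; it is an illustration only, the prediction tool actually used in this study is not assumed, and the peptide in the usage line is a made-up toy sequence rather than any ECoV protein.

```python
import re

# N followed by any residue except P, then S or T. The lookahead keeps
# overlapping sequons (e.g. "NNSS") from being skipped.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Toy peptide (hypothetical): NFS and NGT qualify, NPT does not.
print(find_sequons("AANFSAAANPTAANGTA"))
```

A sequon match only marks a candidate site; as stated in the text, actual glycosylation has to be verified experimentally.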


Table 3
Oligonucleotide primers used for RT-PCR amplification of the leader-body junction of sg mRNAs

| Primer ID | Position | Sequence (5'-3') | Use |
|---|---|---|---|
| 22813N | 22,792-22,813 | GCGTTATCACCAGAAGCGGTGC | Reverse transcription for mRNA2 (NS2) and reverse primer for mRNA3 (HE) PCR |
| 25095N | 25,076-25,095 | CGCCTATTCCAGGCAGAAGG | Reverse transcription for mRNA3 (HE) and mRNA4 (S) |
| 29101N | 29,078-29,101 | GGCAGTAAGAGTATGATGGTCCTC | Reverse transcription for mRNA5 (p4.7), mRNA6 (p12.7) and mRNA7 (E) |
| 30945N | 30,921-30,945 | CTGGGTGGTAACTTAACATGCTGGC | Reverse transcription for mRNA8 (M) and mRNA9 (N) |
| IP | 1-21 | GATTGTGAGCGAATTGCGTGC | Forward primer for all sg mRNA PCR |
| 21982N | 21,958-21,982 | GACGGGACTGACCAACTACACAACC | Reverse primer for mRNA2 (NS2) PCR |
| 24283N | 24,262-24,283 | GCGTGGTGACCCAATACCACTG | Reverse primer for mRNA4 (S) PCR |
| 28100N | 28,078-28,100 | TCCTCTCAGGTCTCCAGATGTCC | Reverse primer for mRNA5 (p4.7) PCR |
| 28334N | 28,312-28,334 | CAGCCTCCTCTATAGTATTGGCG | Reverse primer for mRNA6 (p12.7) PCR |
| 28641N | 28,617-28,641 | CGTCATCCACATTAAGGACTGGTGG | Reverse primer for mRNA7 (E) PCR |
| 29016N | 28,992-29,016 | GGGTTGAAACTCCACCAACTACCAG | Reverse primer for mRNA8 (M) PCR |
| 29710N | 29,691-29,710 | GCGTTGATTGCCATCGGCTG | Reverse primer for mRNA9 (N) PCR |
Fig. 3. ECoV sg mRNA leader-body junction and flanking sequences. The sg mRNA sequences are shown in alignment with the leader and the genome sequences.
The genomic positions of the nucleotides in the leader and genome sequences are indicated. The start codon AUG in each sg mRNA is depicted in bold. Boxed regions
are the putative TRS used for each sg mRNA synthesis. The 36N and 112N in parentheses indicate that 36 and 112 nucleotides in that region are not shown.
Homologous nucleotides between the leader and the mRNA or between the mRNA and the genome are indicated with connecting lines.


ORF9a (nt 29,363-30,703) of ECoV-NC99 encodes the 
predicted N protein containing 446 amino acids. It was predicted 
to contain 36 potential phosphorylation sites. No signal peptide 
or any transmembrane helix was present. The N protein of 
coronaviruses has been shown to be multifunctional, e.g.
interaction with the viral RNA genome to form a viral nucleocapsid,
interaction with the M protein, and the ability for
self-association (Masters, 1992; Narayanan et al., 2000, 2003).
Recently it has also been reported that the N protein may play a 
role in coronavirus replication (Almazan et al., 2004; Schelle 
et al., 2005). 

ORF9b (nt 29,424-30,044) of ECoV-NC99 encodes a hypothetical protein (I) containing 206 amino acids and lies within ORF9a, which encodes the N protein. The I protein was predicted to contain 10 potential phosphorylation sites. No signal peptide or any transmembrane helix was present. In the case of MHV, expression of the I protein has been detected in virus-infected cells but this protein is nonessential for viral replication and viral production (Fischer et al., 1997).

Northern blot analysis of ECoV genomic and subgenomic mRNAs

It is generally accepted that the replicase proteins are synthesized directly from the coronavirus genome, whereas the structural and accessory proteins are expressed from a nested set of subgenomic mRNAs. However, the number of sg mRNAs and the characteristics and expression pattern of the proteins they encode (e.g. a sg mRNA may sometimes express multiple proteins) vary among viruses. To investigate ECoV sg mRNA synthesis, Northern blot analysis was performed to evaluate the synthesis of genomic and subgenomic RNAs in ECoV-infected cells. A digoxigenin-labeled RNA probe complementary to the 3' end (nt 30,660-30,946) of the ECoV genome was used for the hybridization. As shown in Fig. 2, nine mRNAs were detected in ECoV-infected HRT-18G cells at 72 h p.i. The absence of such mRNAs in mock-infected cells confirms that these mRNAs are ECoV-specific. According to the estimated sizes of the mRNAs, it is reasonable to assume that sg mRNAs 2-8 express the NS2, HE, S, p4.7, p12.7, E, and M proteins, respectively, and that mRNA 9 expresses the N protein and probably the I protein as well.

Determination of leader-body junction sequences of sg mRNAs 

There is general agreement that the TRS elements determine the fusion sites of the 5' leader and the 3' body segments in coronavirus sg mRNAs. To determine the precise location of the leader and body TRSs used for ECoV sg mRNA synthesis, the leader-body junction and flanking sequences of each ECoV sg mRNA were determined using sg mRNA-specific RT-PCRs (see Table 3 and Materials and methods for details). The sg mRNA sequences were aligned to the leader and the corresponding 'body' genome sequences as shown in Fig. 3. Analysis of the leader-body junction sequences revealed that the core sequence of the TRS motifs is 5'UCUAAAC3'. The leader TRS (5'UCUAAAC3') and the body TRS (5'UCUAAAC3') used for synthesizing HE mRNA, S mRNA, and N mRNA match each other exactly. There is one mismatch between the leader TRS and the body TRS (5'UCUAAAA3') used for generating the mRNA of the NS2 protein. There is also one mismatch between the leader TRS and the body TRS (5'UCCAAAC3') used for generating E mRNA and M mRNA. There are two mismatches between the leader TRS and the body TRS (5'UUAAAAC3') used for generating the mRNA of the p4.7 protein. Interestingly, in the case of the mRNA of the p12.7 protein, the leader and the body segment are joined at the unusual consensus variant 5'UAAACUUUAUAA3'. It has been shown previously that the mRNA of the p12.7 protein of BCoV also utilizes an unusual consensus variant for joining the leader and body segment (Hofmann et al., 1993). From the sequence data, we conclude that the ECoV common leader on sg mRNAs is the first 64 nucleotides of the ECoV genome.
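The mismatch counts quoted above follow from a position-by-position comparison of each body TRS with the leader TRS. A minimal sketch of that comparison is given below; the sequences are those stated in the text, the p12.7 junction is omitted because its variant has a different length, and the function and variable names are illustrative.

```python
LEADER_TRS = "UCUAAAC"

# Body TRS sequences as given in the text for each sg mRNA.
BODY_TRS = {
    "NS2": "UCUAAAA",
    "HE": "UCUAAAC",
    "S": "UCUAAAC",
    "p4.7": "UUAAAAC",
    "E": "UCCAAAC",
    "M": "UCCAAAC",
    "N": "UCUAAAC",
}


def count_mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


mismatches = {
    name: count_mismatches(LEADER_TRS, body) for name, body in BODY_TRS.items()
}
for name, n in mismatches.items():
    print(f"{name}: {n} mismatch(es) relative to the leader TRS")
```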


Phylogenetic analysis of ECoV

Phylogenetic analyses of ECoV and other coronaviruses were performed based on the amino acid sequences of replicase polyprotein pp1a, the ORF 1b-encoded part of pp1ab, S, E, M, and N. Phylogenetic analysis clustered coronaviruses into three major groups (G1, G2a, and G3) irrespective of the gene used for analysis (Fig. 4). SARS-CoV forms a separate branch and is classified as subgroup 2b (G2b) as suggested previously (Gorbalenya et al., 2004; Snijder et al., 2003). Phylogenetic analysis clearly demonstrated that ECoV falls into the cluster of group 2a coronaviruses and is most closely related to BCoV, HCoV-OC43, and PHEV.

To further explore the possible evolutionary relationships among ECoV, BCoV, HCoV-OC43, and PHEV, the genetic distances of ECoV, BCoV, and PHEV to HCoV-OC43 were determined over the entire genome using SimPlot analysis (Lole et al., 1999). As shown in Fig. 5, the BCoV strains and HCoV-OC43 had the lowest genetic distances over the complete genome. The genetic distance between PHEV and HCoV-OC43 was similar to the distance between BCoV and HCoV-OC43 in most regions of the genome, with the exception of the spike gene, where the genetic distance of PHEV to HCoV-OC43 was significantly greater than the distance of BCoV to HCoV-OC43. The genetic distance of ECoV to HCoV-OC43 was significantly greater than the distance of either BCoV or PHEV to HCoV-OC43 in the first half of ORF 1a, the central part of ORF 1b, and the NS2 and HE genes. In the spike gene, the genetic distance between ECoV and HCoV-OC43 was similar to the distance between PHEV and HCoV-OC43 but much greater than the distance between BCoV and HCoV-OC43. The genetic distances of BCoV and PHEV to HCoV-OC43 observed in this study are consistent with previously reported findings (Vijgen et al., 2005, 2006). Vijgen et al. (2005, 2006) concluded that PHEV diverged from the common ancestor before BCoV and HCoV-OC43. Our analysis suggests that ECoV diverged from a common ancestor earlier than PHEV. In summary, ECoV appears to have emerged earlier than PHEV, BCoV, and HCoV-OC43, notwithstanding the fact that ECoV was not isolated until 1999 from a diarrheic foal in the USA.
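The sliding-window comparison behind Fig. 5 can be sketched as follows. This is not the SimPlot program itself: the sketch assumes pre-aligned sequences of equal length and uses the uncorrected proportion of differing sites (p-distance), which may differ from the distance model applied in SimPlot. The window and step sizes follow the Fig. 5 legend (400 bp and 200 bp); the function names are hypothetical.

```python
from typing import Optional


def p_distance(a: str, b: str) -> Optional[float]:
    """Proportion of differing sites, ignoring alignment gaps ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return None  # window contains only gapped columns
    return sum(x != y for x, y in pairs) / len(pairs)


def sliding_distances(query: str, reference: str, window: int = 400, step: int = 200):
    """Return (window midpoint, p-distance) pairs along an alignment."""
    if len(query) != len(reference):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        results.append((start + window // 2, p_distance(query[start:end], reference[start:end])))
    return results


# Toy alignment (hypothetical): identical for 600 columns, then fully divergent.
reference = "A" * 1000
query = "A" * 600 + "C" * 400
for midpoint, distance in sliding_distances(query, reference):
    print(midpoint, distance)
```

Plotting the distance against the window midpoint for each query sequence gives curves of the kind compared in Fig. 5.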

Fig. 4. Phylogenetic analysis of the amino acid sequences of replicase polyprotein pp1a, the ORF 1b-encoded part of pp1ab, spike (S), envelope (E), membrane (M), 
and nucleocapsid (N) of ECoV-NC99. Multiple amino acid sequence alignments were carried out using ClustalX 1.83 and the unrooted neighbor-joining trees were 
constructed using PAUP 4.0b10. Bootstrap analysis was carried out on 1000 replicate data sets. CCoV, canine coronavirus (GenBank accession number D13096); 
TGEV, porcine transmissible gastroenteritis virus Purdue (NC_002306); FCoV, feline coronavirus (NC_007025); HCoV-NL63, human coronavirus NL63 
(NC_005831); HCoV-229E, human coronavirus 229E (NC_002645); PEDV, porcine epidemic diarrhea virus CV777 (NC_003436); BCoV, bovine coronavirus ENT 
(NC_003045); HCoV-OC43, human coronavirus OC43 strain VR759 (NC_005147); PHEV, porcine hemagglutinating encephalomyelitis virus VW572 (DQ011855); 
MHV, murine hepatitis viruses A59 (NC_001846) and JHM (NC_006852); SDAV, rat sialodacryoadenitis coronavirus (AF207551); HCoV-HKU1, coronavirus HKU1 
(NC_006577); SARS-CoV, SARS coronavirus Tor2 (NC_004718); IBV, avian infectious bronchitis virus Beaudette (NC_001451). 

Fig. 5. Genetic distance between ECoV, BCoV, PHEV and HCoV-OC43. The average genetic distances were calculated over the entire genome using the SimPlot 
program with a sliding window size of 400 bp and a step size of 200 bp. Each curve represents a comparison of the sequence data of ECoV-NC99, the BCoV strains, 
and PHEV-VW572 to the reference sequence data of the HCoV-OC43 ATCC strain VR759 (NC_005147). The sequence data of the BCoV strains used for comparison 
are the 50% consensus sequence of six BCoV strains: BCoV-ENT (NC_003045), BCoV-Alpaca (DQ915164), BCoV-DB2 (DQ811784), BCoV-Mebus (U00735), 
BCoV-Quebec (AF220295), and BCoV-LUN (AF391542). The linear representation of the ECoV-NC99 genome is shown at the top of the diagram. 

Conclusion 

In this study, we have determined the first complete genome 
sequence of ECoV and provided the first comprehensive analysis 
of the ECoV genome. Completion of the genome sequence of 
ECoV will contribute to our understanding of this virus at the 
molecular level and also enrich the database of coronaviruses. 
The sequence data are expected to aid in the development of 
diagnostic and prophylactic reagents. The sequence data of 
ECoV-NC99 will also help identify and characterize other ECoV 
isolates and enhance our understanding of the molecular 
epidemiology of coronaviruses. Neonatal enterocolitis is an 
economically significant disease for horse breeders. Further 
studies are needed to determine the prevalence of ECoV 
infection in equine populations and the relative role of ECoV as a 
cause of enteric disease in horses. 

Materials and methods 

Cells and virus 

The human rectal tumor cell line HRT-18G (American 
Type Culture Collection [ATCC, CRL-11663]) was grown in 
Dulbecco’s modified Eagle’s medium (DMEM) supplemented 
with 4 mM L-glutamine, 5% fetal bovine serum, and 
penicillin/streptomycin at 37 °C in the presence of 5% CO2. 
The equine coronavirus-NC99 (Guy et al., 2000) was 
propagated once in HRT-18G cells to produce the working 
virus stocks. 


Isolation of viral RNA, RT-PCR amplification and sequencing 

The complete genome of ECoV was determined by sequencing 
11 overlapping RT-PCR products encompassing the entire 
genome (nt 1-3615; nt 3446-5458; nt 4953-6600; nt 5497- 
9678; nt 9347-13,021; nt 12,451-15,736; nt 15,425-19,307; nt 
19,039-22,812; nt 22,566-26,390; nt 26,065-29,662; and nt 
29,363-30,992). Viral RNA was isolated from ECoV stocks using 
the QIAamp viral RNA mini kit (Qiagen). Viral RNA was first 
reverse transcribed with AccuScript reverse transcriptase 
(Stratagene) following the manufacturer’s instructions. Then, PCR 
amplification was performed with proof-reading PfuUltra high- 
fidelity DNA polymerase (Stratagene) in a volume of 50 µl: 5 µl 
PfuUltra PCR buffer (10×), 1.0 µl dNTP mix (10 mM each), 1 µl of 
each primer (20 µM), 2 µl cDNA template, 1 µl PfuUltra DNA 
polymerase, and 39.0 µl nuclease-free water. The reaction mixtures 
were incubated at 95 °C for 2 min, followed by 35 cycles of 
amplification at 95 °C for 45 s, 50-53 °C for 45 s, and 72 °C for 
4.5 min, with a final incubation at 72 °C for 10 min. The PCR 
products were gel-purified using QIAquick gel extraction kit 
(Qiagen). Both sense and anti-sense strands were sequenced using 
the Applied Biosystems BigDye Terminator v3.0 sequencing 
chemistry on ABI 3730 DNA sequencers (Davis Sequencing 
Center). A partial genomic sequence (9487 nucleotides) of ECoV had 
been previously determined by two groups (Guy et al., 2000, 
GenBank accession number AF251144; Wu et al., 2003, 
AF523846 and AF523850; H.Y. Wu, J.S. Guy, and D.A. Brian, 
unpublished data, AY316300). These regions were re-sequenced in 
this study. To determine the remaining genomic sequence of ECoV- 
NC99, initial RT-PCR and sequencing primers were designed 
based on multiple alignments of the genomes of BCoV (GenBank 
accession number NC_003045), HCoV-OC43 (NC_005147), 
PHEV (DQ011855), and MHV (NC_001846); additional primers 
were designed based on the results of the first and subsequent 
rounds of sequencing. All of the primer sequences are listed in 
Supplementary Table S1. 
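
The tiling of the genome by the 11 RT-PCR products can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the coordinates are those listed above, while the function name `tiling_overlaps` and the constant names are hypothetical and introduced only for illustration. It confirms that consecutive fragments overlap and that together they span nt 1 to 30,992.

```python
# Fragment coordinates (1-based, inclusive) as listed in the text above.
FRAGMENTS = [
    (1, 3615), (3446, 5458), (4953, 6600), (5497, 9678),
    (9347, 13021), (12451, 15736), (15425, 19307), (19039, 22812),
    (22566, 26390), (26065, 29662), (29363, 30992),
]
GENOME_LENGTH = 30992  # nt, excluding the poly(A) tail


def tiling_overlaps(fragments, genome_length):
    """Return the overlap (in nt) between each pair of consecutive fragments.

    Raises ValueError if the fragments leave a gap or do not reach both
    ends of the genome.
    """
    ordered = sorted(fragments)
    if ordered[0][0] != 1 or max(end for _, end in ordered) != genome_length:
        raise ValueError("fragments do not reach both genome ends")
    overlaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        overlap = prev_end - next_start + 1
        if overlap <= 0:
            raise ValueError(f"gap between nt {prev_end} and nt {next_start}")
        overlaps.append(overlap)
    return overlaps


overlaps = tiling_overlaps(FRAGMENTS, GENOME_LENGTH)
print(overlaps)
```

With the coordinates as given, the shortest overlap between neighboring products is 170 nt (between the first and second fragments), which leaves room for sequence confirmation across every junction.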

DNA and protein sequence analysis 

The nucleotide sequences were assembled and manually 
edited using CodonCode Aligner version 1.5.2 to produce the 
complete sequence of the viral genome. ORF analysis was 
performed using Vector NTI Advance 10 (Invitrogen). RNA 
secondary structures of 5' and 3' UTRs and the ribosomal 
frameshift signals were predicted using the MFOLD program 
with the default parameter settings (Mathews et al., 1999; Zuker, 
2003). Potential 3C-like protease cleavage sites were predicted 
using the NetCorona 1.0 server (Kiemer et al., 2004). Prediction 
of signal peptides and their cleavage sites was conducted using 
SignalP 3.0 server (Nielsen et al., 1997). Potential 
N-glycosylation sites, O-glycosylation sites, and phosphorylation 
sites were predicted using NetNGlyc, NetOGlyc, and NetPhos, 
respectively (Blom et al., 1999; Julenius et al., 2005). Prediction of 
transmembrane domains was performed using TMpred 
(Hofmann and Stoffel, 1993) and TMHMM server 2.0 (Sonnhammer 
et al., 1998). Protein similarity searches were performed using 
BLASTP version 2.2.16, PSI-BLAST against the Protein Data 
Bank (PDB) (Altschul et al., 1997; Schaffer et al., 2001) and 
FASTA version 34.26 against the UniProt protein database with 
the default parameter settings (Pearson and Lipman, 1988). 
Pairwise amino acid comparison was performed using EMBOSS 
Pairwise Alignment Algorithms with the default parameter 
settings (http://www.ebi.ac.uk/emboss/align). Multiple 
sequence alignments were performed using ClustalX version 
1.83 (Thompson et al., 1997). Phylogenetic analysis was carried 
out and unrooted neighbor-joining trees were constructed using 
PAUP version 4.0b10 with the default parameter settings. Bootstrap 
analysis was carried out on 1000 replicate data sets. The genetic 
distance between genomes was determined using SimPlot 
version 3.5.1 (Lole et al., 1999). 
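
To illustrate the sliding-window idea behind the genetic distance plot, the sketch below computes a distance profile along a pairwise alignment with a 400-position window and a 200-position step, the settings stated in the Fig. 5 legend. It is a simplified illustration, not SimPlot's implementation: it uses the uncorrected proportion of differing sites (p-distance) as an assumed stand-in for the distance model, and the function names and toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else float("nan")


def sliding_distance(query, reference, window=400, step=200):
    """Yield (window midpoint, p-distance) along two aligned sequences."""
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        yield start + window // 2, p_distance(query[start:end], reference[start:end])


# Toy aligned pair (made-up data): identical except for alignment columns 400-599.
reference = "A" * 1000
query = "A" * 400 + "C" * 200 + "A" * 400
profile = list(sliding_distance(query, reference))
for midpoint, distance in profile:
    print(midpoint, distance)
```

Because consecutive windows overlap by half their length, a divergent region raises the distance in every window that covers part of it, which is why local peaks in such plots are broader than the underlying region.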

Analysis of viral RNA by Northern blotting 

An anti-sense RNA probe that base pairs with the 3' end of the 
ECoV genome (nt 30,660-30,946) was developed to evaluate 
the synthesis of genomic and subgenomic RNAs in ECoV-infected 
cells by Northern blotting. The corresponding region of ECoV RNA 
was amplified using a primer pair (forward primer 30660P: 5' 
AGCAGATGGATGATCCCCTC 3'; reverse primer 30946N: 5' 
ACTGGGTGGTAACTTAACATGCTG 3') and the QIAGEN 
OneStep RT-PCR kit (Qiagen). The gel-purified RT-PCR 
products were cloned into a linearized plasmid vector with 
overhanging 3' T residues (pDrive Cloning Vector, Qiagen). The 
authenticity and orientation of the insert were determined by 
sequencing both strands of DNA with M13 reverse and forward 
primers. Plasmid DNA was linearized with BamHI (Roche), 
phenol/chloroform extracted, ethanol precipitated, and 
resuspended in nuclease-free water. A digoxigenin (DIG)-labeled 
RNA probe was prepared using the DIG RNA labeling kit 
(Roche) according to the manufacturer’s instructions. 

Intracellular RNA was extracted at 72 h p.i. from ECoV- 
infected HRT-18G cells using the RNAqueous-4PCR kit 
(Ambion). Northern hybridization with the DIG-labeled RNA 
probe was carried out following the protocols that had been 
previously described for equine arteritis virus (Balasuriya et al., 
2004). 

Determination of the leader-body junction sequence 

The leader-body junction sites of all ECoV sg mRNAs were 
RT-PCR amplified and sequenced. Briefly, intracellular RNA was 
extracted from ECoV-infected HRT-18G cells using the 
RNAqueous-4PCR kit (Ambion). Reverse transcription was carried out 
with an RT primer located downstream of the body TRS region in a 
sg mRNA (Table 3) using SuperScript III reverse transcriptase 
(Invitrogen) following the manufacturer’s instructions. Due to the 
nested nature of sg mRNAs, such an RT primer also binds to the 
corresponding positions in all larger viral mRNAs, including the 
genomic RNA. Subsequently, cDNA was PCR amplified with a 
forward primer (IP) located in the leader sequence and a reverse 
primer located just upstream of the RT primer in the body of the 
mRNA (Table 3). Amplification was performed in a volume of 
50 µl: 5 µl PfuTurbo PCR buffer (10×), 0.4 µl dNTP mix (25 mM 
each), 1 µl of each primer (20 µM), 2 µl cDNA template, 1 µl 
PfuTurbo® DNA polymerase, and 39.6 µl nuclease-free water. The 
reaction mixtures were incubated at 95 °C for 2 min, followed by 
35 cycles at 95 °C for 45 s, 50-56 °C for 45 s, and 72 °C for 3 min, 
with a final incubation at 72 °C for 10 min. RT-PCR products 
corresponding to each mRNA species could be distinguished by 
size differences on agarose gel. PCR products were gel-purified and 
sequenced to obtain the leader-body junction sequences for each sg 
mRNA. 
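
As a quick consistency check on the reaction mixture described above, the component volumes can be summed and the final concentrations derived from the dilution rule C1·V1 = C2·V2. This is a minimal sketch: it assumes the volumes are in microliters and the primer stocks are 20 µM (the unit symbols are damaged in some copies of the text), and the dictionary keys and function name are hypothetical labels for illustration.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution rule C1*V1 = C2*V2, solved for C2 (same unit as stock_conc)."""
    return stock_conc * stock_volume_ul / total_volume_ul


# Volumes in microliters, as listed for the leader-body junction PCR.
junction_mix = {
    "10x PCR buffer": 5.0,
    "dNTP mix (25 mM each)": 0.4,
    "forward primer (assumed 20 uM)": 1.0,
    "reverse primer (assumed 20 uM)": 1.0,
    "cDNA template": 2.0,
    "DNA polymerase": 1.0,
    "nuclease-free water": 39.6,
}

total_ul = sum(junction_mix.values())
dntp_mM = final_concentration(25.0, 0.4, total_ul)    # each dNTP, in mM
primer_uM = final_concentration(20.0, 1.0, total_ul)  # each primer, in uM

print(f"total volume: {total_ul:.1f} ul")
print(f"each dNTP:    {dntp_mM:.2f} mM")
print(f"each primer:  {primer_uM:.2f} uM")
```

Under these assumptions the components add up to the stated 50 µl, and each dNTP ends up at 0.2 mM. The genome-amplification reaction (1.0 µl of a 10 mM stock in 50 µl) gives the same final dNTP concentration.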

Nucleotide sequence accession number 

The nucleotide sequence of ECoV was deposited in GenBank 
under the accession number EF446615. 